Abstract

The dynamical evolution of many economic, sociological, biological and physical
systems tends to be dominated by a relatively small number of unexpected,
large changes ('extreme events'). We study the large, internal changes produced
in a generic multi-agent population competing for a limited resource,
and find that the level of predictability actually increases prior to a large
change. These large changes hence arise as a predictable consequence of
information encoded in the system's global state.

Populations comprising many 'agents' (e.g. people, species, data-packets, cells) who compete
for a limited resource are believed to underlie the complex dynamics observed in areas
as diverse as economics, sociology, internet traffic, ecology and biology.

The reliable prediction of large future changes ('extreme events') in such complex systems
would be of enormous practical importance, but is widely considered to be impossible.



In this paper, we examine the predictability of large future changes produced within an 
evolving population of agents who compete for a limited resource. We find that the level of 
predictability in the system actually increases prior to a large change. The implication is 
that such a large change arises as a predictable consequence of information encoded in the 
system's global state, as opposed to being triggered by some isolated random event. 

We consider a generic multi-agent system comprising a population of $N_{\rm tot}$ agents where
only a maximum of $L < N_{\rm tot}$ agents can be winners at each timestep; an everyday example
would be a popular bar with a limited seating capacity $L$. For the purpose of this paper,
we consider a specific case of such a limited-resource problem with $L = (N_{\rm tot} - 1)/2$ and
$N_{\rm tot}$ odd [11], hence there are more losers than winners. We note that similar dynamics
can also occur for more general $L$ values. Each agent is therefore seeking to be in the
minority group: for example, a buyer in a financial market may obtain a better price if
more people are selling than buying; a driver may have a quicker journey if she chooses the
route with less traffic. At each timestep, an agent decides whether to enter a game where
the choices are option 0 (e.g. buy, choose route A) and option 1 (e.g. sell, choose route B).
Each agent holds a finite number of strategies and only a subset $N = N_0 + N_1 < N_{\rm tot}$ of
the population, who are sufficiently confident of winning, actually play: $N_0$ agents choose
0 while $N_1$ choose 1. If $N_0 - N_1 > 0$, the winning decision (outcome) is '1' and vice versa. If
$N_0 = N_1$ the tie is decided by a coin-toss. Hence $N$ and the 'excess demand' $N_{0-1} = N_0 - N_1$
both fluctuate with time. In contrast to the basic Minority Game (MG) [11], this variable-$N$
model has the realistic feature of accounting for agents' confidence [13]. Furthermore the
variable-$N$ model can be used to generate statistical and dynamical features similar to those
observed in financial markets (archetypal examples of complex systems) [13]. Therefore,
demonstration of predictability of extreme events in the present multi-agent model would
open up the exciting possibility of predictability of extreme events in real-world systems.

The only global information available to the agents is a common bit-string 'memory' of 
the m most recent outcomes. The agents can thus be said to exhibit 'bounded rationality'.
Consider m = 2; the $2^m = 4$ possible history bit-strings are 00, 01, 10 and 11. A strategy
consists of a response, i.e. 0 or 1, to each possible bit-string; hence there are $2^{2^m} = 16$
possible strategies. At the beginning of the game, each agent randomly picks q strategies
and after each turn assigns one (virtual) point to a strategy which would have predicted
the correct outcome. Agents have a time horizon T over which virtual points are collected
and a threshold probability level r; strategies with a probability of winning greater than
or equal to r, i.e. having at least $rT$ virtual points, are available to be used by the agent. We
call these active strategies. Agents with no active strategies within their individual set of 
q strategies do not play at that timestep. Agents with one or more active strategies play 
the one with the highest virtual point score; any ties between active strategies are resolved 
using a coin-toss. The 'excess demand' $N_{0-1}$, which can be identified as the output from
the model system, can be expressed as

$$N_{0-1} = \sum_i x_i \,(1 - 2 s_i) \tag{1}$$

where $s_i$ is the prediction of the i-th strategy, e.g. 0 or 1, and $x_i$ is the number of agents using
this strategy, with the summation taken over the set of active strategies at that timestep.
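
As a minimal sketch of the bookkeeping described above, assuming predictions are coded as 0 or 1 and that each active strategy is used by a known number of agents, the strategy space and the excess demand can be tallied as follows. The function names and the toy numbers are illustrative and are not taken from the paper.

```python
from itertools import product

def all_strategies(m):
    """Every strategy for memory m: one 0/1 response per history bit-string.

    There are 2**m histories, hence 2**(2**m) distinct strategies.
    """
    return list(product((0, 1), repeat=2 ** m))

def excess_demand(predictions, usage):
    """N_0 - N_1, given each active strategy's prediction (0 or 1)
    and the number of agents currently playing that strategy."""
    n0 = sum(x for s, x in zip(predictions, usage) if s == 0)
    n1 = sum(x for s, x in zip(predictions, usage) if s == 1)
    return n0 - n1

print(len(all_strategies(2)))                 # 16
print(excess_demand([0, 1, 0], [5, 3, 2]))    # 4  (5 + 2 agents choose 0, 3 choose 1)
```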

Because of the feedback in the game, any particular strategy's success is short-lived. If 
all the agents begin to use similar strategies, and hence make the same decision, such a 
strategy ceases to be profitable. The game can be broadly classified into three regimes. (i)
The number of strategies in play is much greater than the total available: groups of traders
will play using the same strategy and therefore crowds should dominate the game [15]. (ii)
The number of strategies in play is much less than the total available: grouping behaviour
is therefore minimal. (iii) The number of strategies in play is comparable to the total number
available: this represents a transition regime and is of most interest, since it produces
seemingly random dynamics with occasional large movements. Even if complete knowledge
of the state of the game were available at any timestep, it seems impossible that subsequent
outcomes should be predictable with significant accuracy since the coin-tosses which are
used to resolve ties in decisions (i.e. $N_0 = N_1$) and active-strategy scores inject stochasticity
into the game's time-evolution. Remarkably, however, we find that large changes over several
consecutive timesteps can be predicted with surprising accuracy without any detailed
knowledge of the game itself.

Suppose we are given an analogue time series H(t) generated by a physical, sociological,
biological or economic system, e.g. a financial market [13], whose dynamics are well-described
by the multi-agent game for a fixed unknown parameter set m, N, r, T and an unknown
specific realization of initial strategy choices. We call this our 'black-box' game. Our goal
is to identify 'third-party' games which can be used to predict large future changes in H(t),
where $\Delta H(t)$ is defined to be directly proportional to the excess demand $N_{0-1}$. For example,
$\Delta H(t)$ could be the price change in a financial market, or may instead be a quantity which
is derived from the system output using a known non-linear function. For the remainder of
this article, we focus on the following game parameters for the 'black-box' game: N = 101,
m = 3, q = 2, T = 100, r = 0.53, although our conclusions are more general [16]. Since
r > 0.5 an agent will not participate unless she believes she has a better than average chance
of winning.
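
The game described so far can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the authors' code: all virtual scores start at zero (so nobody plays until some strategy has collected enough points), the initial history is random, and a step in which nobody plays is resolved by a coin-toss like any other tie. The function name `run_game` and its defaults (set to the quoted black-box parameters) are hypothetical.

```python
import random
from collections import deque

def run_game(n_tot=101, m=3, q=2, horizon=100, r=0.53, steps=500, seed=0):
    """Simulate the excess demand N_0 - N_1 of a variable-N minority game."""
    rng = random.Random(seed)
    n_hist = 2 ** m                      # number of possible history bit-strings
    # A strategy is a tuple with one 0/1 response per history; each agent draws q.
    agents = [[tuple(rng.randint(0, 1) for _ in range(n_hist)) for _ in range(q)]
              for _ in range(n_tot)]
    # Per-strategy record of hits (1) and misses (0) over the last `horizon` steps.
    records = {s: deque(maxlen=horizon) for agent in agents for s in agent}
    history = rng.randrange(n_hist)      # last m outcomes packed into an integer
    excess = []
    for _ in range(steps):
        points = {s: sum(rec) for s, rec in records.items()}
        n0 = n1 = 0
        for agent in agents:
            best = max(points[s] for s in agent)
            if best < r * horizon:
                continue                 # no active strategy: the agent sits out
            # Play the highest-scoring strategy, breaking ties with a coin-toss.
            choice = rng.choice([s for s in agent if points[s] == best])
            if choice[history] == 0:
                n0 += 1
            else:
                n1 += 1
        if n0 == n1:
            outcome = rng.randint(0, 1)  # tie (assumed to include nobody playing)
        else:
            outcome = 1 if n0 > n1 else 0    # the minority choice wins
        # Award a virtual point to every strategy that predicted the outcome.
        for s, rec in records.items():
            rec.append(1 if s[history] == outcome else 0)
        history = ((history << 1) | outcome) % n_hist
        excess.append(n0 - n1)
    return excess

series = run_game(steps=300, seed=1)
print(len(series), max(abs(v) for v in series))
```

With these assumptions the first few dozen values are necessarily zero, because no strategy can have reached the threshold score yet; a cumulative sum of the returned list gives one possible H(t).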

We start by running H(t) through a trial third-party game in order to generate an
estimate of $S_0$ and $S_1$ at each timestep, the number of active strategies predicting a 0 or 1
respectively. This is obtained from the strategy space, or the pool of all available strategies
in the third-party game, and is independent of the distribution of agents. We wish to predict
$\Delta H(t)$, i.e. $N_{0-1}$; we will do this by linking $S_0$ and $S_1$ to $N_{0-1}$ through an appropriate
probability distribution. Provided the strategy space in the black-box game is reasonably well
covered by the agents' random choice of initial strategies, any bias towards a particular outcome
in the active strategy set will propagate itself as a bias in the value of $N_{0-1}$ away from
zero. Thus we expect $N_{0-1}$ to be approximately proportional to $S_0 - S_1 = S_{0-1}$. This is
equivalent to assuming an equal weighting $x_i$ on each strategy in Eq. (1), indicating the exact
distribution of strategies among the individual agents is unimportant in this regime. In
addition, the number of agents taking part in the game at each timestep will be related
to the total number of active strategies $S_0 + S_1 = S_{0+1}$, hence the error (i.e. variance) in
the prediction of $N_{0-1}$ using $S_{0-1}$ will be dependent on $S_{0+1}$. Based on extensive statistical
analysis of known simulations for the multi-agent game, we have confirmed that it is
reasonable to model the relationship by

$$N_{0-1} = b\,S_{0-1} + \epsilon\left[0, f(S_{0+1})\right]$$

where $\epsilon$ is a noise term with mean zero and variance proportional to $S_{0+1}$, and b is a
constant. In particular, $N_{0-1}$ is well described by a Normal distribution of the form
$N_{0-1} \sim N(b S_{0-1}, c S_{0+1})$, where c is a constant. The variance of our forecast density function can be
minimized by choosing a third-party game that achieves the maximum correlation between
$N_{0-1}$ and our explanatory variable $S_{0-1}$, with the unexplained variance being characterized
by a linear function of $S_{0+1}$. We focus on the parameter regime known to produce realistic
statistics (e.g. fat-tailed distribution of returns in financial markets). Within this parameter
space we run an ensemble of third-party games through the black-box series H(t), calculating
the values of $S_{0-1}$ from the reconstructed strategy space. We then identify the configuration
that achieves the highest correlation between $S_{0-1}$ and $N_{0-1}$ produced by the original
black-box game. As shown in Fig. 1, the third-party game that achieves the highest correlation is
the one whose parameters coincide with the black-box game. From a knowledge of just H(t),
and hence $N_{0-1}$, we have therefore used next-step prediction to recover all the parameters
of relevance to produce a 'model' game for prediction purposes.
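
One way to estimate the constants b and c, and the correlation used to rank candidate third-party games, is sketched below. It assumes the Normal model quoted above, with variance proportional to the number of active strategies, and uses weighted least squares; the paper does not state which fitting procedure was used. The data are synthetic and the values 0.8 and 0.5 are arbitrary.

```python
import math
import random

def fit_b_c(s_minus, s_plus, n_excess):
    """Fit N = b * S_minus + noise, where the noise variance is c * S_plus.

    Each observation is weighted by 1 / S_plus (weighted least squares);
    rows with no active strategies (S_plus == 0) carry no information.
    """
    rows = [(sm, sp, n) for sm, sp, n in zip(s_minus, s_plus, n_excess) if sp > 0]
    b = (sum(sm * n / sp for sm, sp, n in rows)
         / sum(sm * sm / sp for sm, sp, n in rows))
    c = sum((n - b * sm) ** 2 / sp for sm, sp, n in rows) / len(rows)
    return b, c

def correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Synthetic data drawn from the assumed model (not output of a real game).
rng = random.Random(0)
true_b, true_c = 0.8, 0.5
s_minus, s_plus, n_excess = [], [], []
for _ in range(5000):
    s0, s1 = rng.randint(0, 20), rng.randint(0, 20)
    if s0 + s1 == 0:
        continue
    s_minus.append(s0 - s1)
    s_plus.append(s0 + s1)
    n_excess.append(rng.gauss(true_b * (s0 - s1), math.sqrt(true_c * (s0 + s1))))

b_hat, c_hat = fit_b_c(s_minus, s_plus, n_excess)
print(round(b_hat, 2), round(c_hat, 2), round(correlation(s_minus, n_excess), 2))
```

In the parameter-recovery step, `correlation` would be evaluated once per candidate third-party game and the candidate with the largest value retained.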

We now extend this forecast to an arbitrary number j of timesteps into the future, in
order to address the predictability of large changes in H(t) arising over several consecutive
timesteps. This is achieved by calculating the net value of $S_{0-1}$ along all the $2^j$
possible future routes of the third-party game, weighted by appropriate probabilities. In
order to assign these probabilities, it is necessary to calculate all possible $S_{0-1}$ values in the
next j timesteps. This is possible since the only data required to update the strategy space
between timesteps is knowledge of the winning decision, and hence the third-party game can
be directed along a given path independent of the predictions of the individual agents in the
black-box game. The change in $N_{0-1}$ along a path indexed by k is given by a convolution
of the predictions over the j individual steps



$$N(\mu_k, \sigma_k^2) \sim N\left(b \sum S_{0-1},\; c \sum S_{0+1}\right)$$

where the summation is taken along the path represented by k. In general, the pdf for the
change in $N_{0-1}$ during the next j timesteps is a mixture of Normals:

$$P[\Delta N_{0-1}(t, t + j)] = \sum_{k=0}^{2^j - 1} p_k \, N(\mu_k, \sigma_k^2) \tag{2}$$

where $p_k$ is the probability of path k being taken.
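
The path enumeration can be sketched as below. The text does not spell out how the path probabilities are assigned, so this sketch makes an assumption: at each step the probability of outcome 1 is taken to be the probability that the one-step Normal forecast of the excess demand is positive, and path probabilities are products of these. The callables `counts` and `advance`, which stand in for the third-party game, and the toy table are hypothetical.

```python
from itertools import product
from statistics import NormalDist

def path_components(state, j, b, c, counts, advance):
    """Return [(p_k, mu_k, var_k)] for all 2**j outcome paths of length j.

    counts(state)           -> (S_0, S_1), active strategies predicting 0 and 1
    advance(state, outcome) -> next state of the third-party game
    """
    components = []
    for path in product((0, 1), repeat=j):
        s, prob, mu, var = state, 1.0, 0.0, 0.0
        for outcome in path:
            s0, s1 = counts(s)
            step_mu, step_var = b * (s0 - s1), c * (s0 + s1)
            if step_var > 0:
                # Assumed: outcome 1 wins when N_0 - N_1 > 0 under the one-step Normal.
                p_one = 1.0 - NormalDist(step_mu, step_var ** 0.5).cdf(0.0)
            else:
                p_one = 0.5              # no active strategies: treated as a coin-toss
            prob *= p_one if outcome == 1 else 1.0 - p_one
            mu += step_mu                # means add along the path
            var += step_var              # variances add along the path
            s = advance(s, outcome)
        components.append((prob, mu, var))
    return components

def mixture_pdf(x, components):
    """Density of the mixture of Normals at x (zero-variance paths are skipped)."""
    return sum(p * NormalDist(mu, var ** 0.5).pdf(x)
               for p, mu, var in components if var > 0)

# Toy stand-in for a third-party game with m = 2: the state is the packed history.
TOY_COUNTS = {0: (6, 2), 1: (3, 3), 2: (1, 5), 3: (4, 4)}
comps = path_components(0, 3, 1.0, 0.5, TOY_COUNTS.get,
                        lambda s, o: ((s << 1) | o) % 4)
print(len(comps), round(sum(p for p, _, _ in comps), 6))   # 8 1.0
```

The cost grows as $2^j$, so this brute-force enumeration is only practical for short horizons.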

To test the validity of the density forecast, we perform a statistical evaluation using the
realized variables. The one-step-ahead forecasts are normal distributions, and we define the
test statistic $z_i$ as

$$z_i = \frac{x_i - \mu_i}{\sigma_i} \tag{3}$$

where $\mu_i$ and $\sigma_i^2$ are the mean and variance of the forecast distribution, and $x_i$ is the
realised value of $N_{0-1}$ at the timestep i. The $z_i$ were found to be independent standard normal N(0, 1)
variates for 1000 out-of-sample predictions, confirming that the predicted distributions are
correct. To compare the forecasts to a naive 'no-change' prediction, we calculate the Theil
coefficient [18], which is the sum of squared prediction errors divided by the sum of squared
errors resulting from the naive forecast. A coefficient of less than one implies a superior
performance compared to the naive prediction; calculated values were typically in the region
of 0.4. There is no accepted method in the literature for evaluating multi-step-ahead
forecasts. However, the density function for an arbitrary time horizon is a mixture of
Normal distributions, see Eq. (2), which can be roughly characterised in terms of a
single mean and variance:

$$\mathrm{Var}[X] = \sum_{i=1}^{n} p_i \left(\sigma_i^2 + \mu_i^2\right) - \left(\sum_{i=1}^{n} p_i \mu_i\right)^2$$

Hence the same test statistic as Eq. (3) can be calculated. Again, the predictions were found
to be reliable.
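
The evaluation quantities in this paragraph are simple to compute. The sketch below assumes the test statistic is the standardised residual (realised value minus forecast mean, divided by the forecast standard deviation) and leaves the choice of naive forecast to the caller, since the text does not say exactly how the 'no-change' forecast is formed. Function names and numbers are illustrative.

```python
import math

def mixture_moments(components):
    """Mean and variance of a mixture given [(p_i, mu_i, var_i)]."""
    mean = sum(p * mu for p, mu, _ in components)
    second = sum(p * (var + mu * mu) for p, mu, var in components)
    return mean, second - mean * mean

def z_statistic(realised, mean, variance):
    """Standardised residual of a realised value under a forecast density."""
    return (realised - mean) / math.sqrt(variance)

def theil_coefficient(actual, forecast, naive):
    """Sum of squared forecast errors over sum of squared naive-forecast errors."""
    num = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    den = sum((a - n) ** 2 for a, n in zip(actual, naive))
    return num / den

mean, var = mixture_moments([(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)])
print(mean, var)                        # 0.0 2.0
print(z_statistic(5.0, 3.0, 4.0))       # 1.0
# Here the naive forecast is taken to be the previous value (one possible reading).
print(theil_coefficient([1, 2, 3], [1, 2, 4], [0, 1, 2]))
```

A Theil coefficient below one means the model forecast has a smaller squared error than the naive one.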

Given that we can derive accurate distributions for the future changes in $N_{0-1}$, these
will be of most practical interest in situations where there is likely to be a substantial,
well-defined movement. We characterise these moments by seeking distributions with a high
value of $|\mu|$ and a low value of $\sigma$ at a future timestep, or over a specified time horizon. In
Fig. 2 we plot $|\mu|$ vs. $\sigma$ for a number of separate forecasts, and take a fraction of points that
are furthest from the average trend indicated by the regression line, i.e. we are interested
in the outliers. The point with the highest residual is thus a candidate for the game to
be in a highly predictable phase. We call these time periods 'predictable corridors', since
comparatively tight confidence intervals can be drawn for the future evolution of the excess
demand, a typical example of which is shown in Fig. 3. We subject these points to an
identical test as described earlier to ensure these potential outliers are well described by our
probability distributions, and this is found to be true.
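
The outlier selection can be illustrated as follows, assuming an ordinary least-squares line of $|\mu|$ against $\sigma$ and ranking forecasts by their positive residual (large $|\mu|$ for their $\sigma$). The fraction retained and the function names are illustrative choices, not values from the paper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def predictable_corridors(abs_mu, sigma, fraction=0.1):
    """Indices of the forecasts lying furthest above the regression line."""
    slope, intercept = fit_line(sigma, abs_mu)
    residuals = [m - (slope * s + intercept) for m, s in zip(abs_mu, sigma)]
    keep = max(1, int(fraction * len(residuals)))
    ranked = sorted(range(len(residuals)), key=residuals.__getitem__, reverse=True)
    return ranked[:keep]

sigma = [1.0, 2.0, 3.0, 4.0, 5.0]
abs_mu = [2.0, 4.0, 16.0, 8.0, 10.0]    # the third forecast has an unusually large |mu|
print(predictable_corridors(abs_mu, sigma, fraction=0.2))   # [2]
```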

We performed extensive numerical simulations to check the validity of these predictive
corridors [16]. Our procedure is to take a sample of 5000 timesteps, then fit parameters
using the first 3000 steps. We then look at the largest changes (extreme events) in our
out-of-sample region. Extreme events are ranked by the largest movements in H(t) over
a given window size W. Hence we consider the top twenty extreme events and calculate
the probability integral transform $z_t$ of the realized variables with respect to the forecast
densities. The $z_t$ are found to be approximately uniform U[0,1] variates, confirming that the
forecast distribution is essentially correct (see supplementary material for full details). About
50% of large movements occur in periods with tight predictable corridors, i.e. a large value
of $|\mu|/\sigma$. Both the magnitude and sign of these extreme events are therefore predictable.
The remainder correspond to periods with very wide corridors. Although the magnitude of
the future movement is now uncertain, the present method predicts with high probability
the actual direction of change. Even this more limited information would be invaluable for
assessing future risk in the physical, economic, sociological or biological system of interest.
Our predictions generated from the third-party game were consistent with all such extreme
changes in the actual (black-box) time series H(t). Finally we note that some empirical
support for our claim of enhanced predictability prior to extreme movements has very
recently appeared for the case of financial markets [20].
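
The ranking of extreme events and the probability integral transform can be sketched as below. The sketch assumes the forecast density is supplied as a list of (probability, mean, variance) mixture components, and it does not merge overlapping windows, which the text leaves unspecified. All names are illustrative.

```python
from statistics import NormalDist

def largest_moves(h, window, top):
    """Start indices of the `top` largest |H(t + W) - H(t)|, largest first."""
    moves = [(abs(h[t + window] - h[t]), t) for t in range(len(h) - window)]
    moves.sort(reverse=True)
    return [t for _, t in moves[:top]]

def pit(realised, components):
    """Probability integral transform: the mixture CDF at the realised value.

    If the forecast densities are correct, these values are uniform on [0, 1].
    """
    return sum(p * NormalDist(mu, var ** 0.5).cdf(realised)
               for p, mu, var in components)

h = [0, 1, 1, 6, 5, 4]
print(largest_moves(h, window=2, top=2))    # [1, 2]
print(pit(0.0, [(0.5, -1.0, 1.0), (0.5, 1.0, 1.0)]))
```

For the procedure described above, `largest_moves` would be applied only to the out-of-sample part of the series (after the first 3000 steps) with `top=20`.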

We are very grateful to Michael Hart, Paul Jefferies, Pak Ming Hui and Jeff Dewynne 
for their involvement in this project. D.L. thanks EPSRC for a studentship.

FIG. 1. Estimation of the parameter set for the black-box game. The correlation between
$N_{0-1}$ and $S_{0-1}$ is calculated over 200 timesteps for an ensemble of candidate third-party games.
The third-party game that achieves the highest correlation is the one with the same parameters as
the black-box game.

FIG. 2. A plot of $|\mu|$ vs. $\sigma$ for 500 separate four-step density forecasts. Items marked by "x"
are forecasts with an unusually large value of $|\mu|/\sigma$. At these moments, the game is likely to be in
a highly predictable phase.

FIG. 3. Comparison between the forecast density function and the realised time series H(t)
for a typical large movement. The large, well-defined movement is correctly predicted.